Working with the GCDF Dataset:

A New User’s Perspective

Teal Emery

Agenda

  1. Working with the GCDF data as a new user: joys, pain points, and ideas for marginal improvements.
  2. chinadevfin2: an R package for working with the GCDF 2.0 dataset.
  3. Group discussion.

1. Working with the GCDF data as a new user

Context

How I became a connoisseur of large international datasets…

  1. Private sector
  2. Policy research
  3. Teaching

Overall Assessment

GCDF 2.0 is a well-organized dataset with A LOT of amazing data. The methodology and variables are well documented. The pain points I’ve experienced working with it are common to most large datasets:

  1. Getting the data ready for analysis is time-consuming.
  2. Country names are not presented in a standardized format.

Three Easy Improvements

You work hard to create these datasets. What are some easy-to-implement changes to enhance their use?

  1. Release the dataset as a .csv file in addition to the .xlsx to make it easier to import into data analysis software.
  2. Add ISO3C country codes for recipient countries to make it easier to combine the GCDF data with complementary datasets. [Update: this is done in the new release!]
  3. Create a “cookbook” with simple examples of how to use the data to gain insights, plus answers to common challenges. This can ease adoption of the dataset.
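Improvements 1 and 2 matter because a plain CSV can be read with standard tools, and an ISO3C code is a stable join key where country names are not. The sketch below illustrates both points. It uses Python's standard library so it is self-contained (the talk's own tooling is R), and the column names (`recipient_iso3c`, `amount_usd`, `example_indicator`) and all values are hypothetical toy data, not actual GCDF headers or figures, and not the chinadevfin2 API.

```python
import csv
import io

# Toy, made-up rows; column names are hypothetical, not real GCDF headers.
# Note the two different spellings of the same recipient country.
PROJECTS_CSV = """recipient,recipient_iso3c,amount_usd
Cote d'Ivoire,CIV,100
Côte d’Ivoire,CIV,50
Laos,LAO,200
"""

# A hypothetical complementary dataset keyed by ISO3C code.
OTHER_CSV = """iso3c,example_indicator
CIV,1
LAO,2
"""


def read_csv_text(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join_iso3c(projects, other, other_key="iso3c"):
    """Attach columns from `other` to each project row by ISO3C code."""
    lookup = {row[other_key]: row for row in other}
    joined = []
    for row in projects:
        merged = dict(row)
        match = lookup.get(row["recipient_iso3c"], {})
        merged.update({k: v for k, v in match.items() if k != other_key})
        joined.append(merged)
    return joined


projects = read_csv_text(PROJECTS_CSV)
rows = left_join_iso3c(projects, read_csv_text(OTHER_CSV))

# Aggregating on the code groups both spellings under one key;
# grouping on the raw "recipient" name would split them in two.
totals = {}
for r in rows:
    code = r["recipient_iso3c"]
    totals[code] = totals.get(code, 0.0) + float(r["amount_usd"])

print(totals)  # {'CIV': 150.0, 'LAO': 200.0}
```

The same logic applies in R or any other tool: once both tables share a standard code column, the join no longer depends on how each source happens to spell a country's name.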

2. Introducing the chinadevfin2 R Package

chinadevfin2

Caveats

  1. So far, this is a personal project.
  2. It is a minimum viable product.
  3. It won’t take too much effort to adapt this to the new GCDF 3.0 dataset.

Why make an R package?

Work hard to be lazy.

Let’s Explore

  1. chinadevfin2 R package website
  2. Getting Started tutorial
  3. A simple interactive web app built using chinadevfin2

Benefits of R

The cost of using R is the learning curve. But there are amazing free learning materials online.

  • Open Source. R is free, and there is a vibrant developer community constantly adding functionality.

  • Work hard to be lazy: if you do anything more than once, make a function.

  • Communication tools: Quarto and Shiny. This presentation was made in R, and so was my personal website.

  • Automation: GitHub Actions & parameterized reporting.
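As a minimal illustration of “if you do anything more than once, make a function”, and of the idea behind parameterized reporting, the sketch below writes one aggregation and reuses it with different parameters. It is a hypothetical Python example, not part of chinadevfin2, Quarto, or GitHub Actions; the function name, field names, and toy values are made up for illustration.

```python
from collections import defaultdict

# Toy, made-up project records; field names are hypothetical.
PROJECTS = [
    {"recipient_iso3c": "CIV", "year": 2015, "amount_usd": 100.0},
    {"recipient_iso3c": "CIV", "year": 2016, "amount_usd": 50.0},
    {"recipient_iso3c": "LAO", "year": 2016, "amount_usd": 200.0},
]


def total_by_recipient(projects, year=None):
    """Sum amounts per recipient ISO3C code, optionally for a single year.

    Written once and reused with different parameters instead of
    copy-pasting the same aggregation for every year of interest.
    """
    totals = defaultdict(float)
    for p in projects:
        if year is None or p["year"] == year:
            totals[p["recipient_iso3c"]] += p["amount_usd"]
    return dict(totals)


# The same function drives several "reports" just by changing a parameter.
for y in (2015, 2016):
    print(y, total_by_recipient(PROJECTS, year=y))
# 2015 {'CIV': 100.0}
# 2016 {'CIV': 50.0, 'LAO': 200.0}
```

A parameterized report applies the same idea one level up: the document template is written once, and a parameter such as the year or the recipient country selects what each rendered copy shows.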

3. Group Discussion